{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# JPX Tokyo Stock Exchange Kale Pipeline\n",
    "\n",
    "In this [Kaggle competition](https://www.kaggle.com/competitions/jpx-tokyo-stock-exchange-prediction/overview) \n",
    "\n",
    ">Japan Exchange Group, Inc. (JPX) is a holding company operating one of the largest stock exchanges in the world, Tokyo Stock Exchange (TSE), and derivatives exchanges Osaka Exchange (OSE) and Tokyo Commodity Exchange (TOCOM). JPX is hosting this competition and is supported by AI technology company AlpacaJapan Co.,Ltd.\n",
    "\n",
    "> In this competition, you will model real future returns of around 2,000 stocks. The competition will involve building portfolios from the stocks eligible for predictions. The stocks are ranked from highest to lowest expected returns and they are evaluated on the difference in returns between the top and bottom 200 stocks."
   ]
  },
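  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "To make the evaluation idea above concrete, here is a minimal sketch of a top-minus-bottom spread for a single day. It is a simplified, unweighted illustration and not the official competition metric; the function name `spread_return` and its parameter `k` are hypothetical and do not come from the competition code.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "\n",
    "def spread_return(predicted, realized, k=200):\n",
    "    '''Mean realized return of the k top-ranked stocks minus that of the k bottom-ranked.'''\n",
    "    if len(predicted) != len(realized):\n",
    "        raise ValueError('predicted and realized must have the same length')\n",
    "    if not 0 < k <= len(predicted) // 2:\n",
    "        raise ValueError('k must be positive and at most half the number of stocks')\n",
    "    # Order stock indices from highest to lowest predicted return.\n",
    "    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)\n",
    "    top = mean(realized[i] for i in order[:k])\n",
    "    bottom = mean(realized[i] for i in order[-k:])\n",
    "    return top - bottom\n",
    "\n",
    "\n",
    "# Toy example with four stocks and k=1: best-ranked stock versus worst-ranked stock.\n",
    "toy_spread = spread_return([0.9, 0.1, 0.5, 0.3], [0.02, -0.01, 0.00, 0.01], k=1)\n",
    "```\n",
    "\n",
    "With `k=200` and roughly 2,000 stocks, this compares the 200 highest-ranked stocks with the 200 lowest-ranked ones."
   ]
  },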
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Install necessary packages\n",
    "\n",
    "We can install the necessary package by either running `pip install --user <package_name>` or include everything in a `requirements.txt` file and run `pip install --user -r requirements.txt`. We have put the dependencies in a `requirements.txt` file so we will use the former method.\n",
    "\n",
    "> NOTE: Do not forget to use the `--user` argument. It is necessary if you want to use Kale to transform this notebook into a Kubeflow pipeline.",
    "\n",
    "After installing python packages, restart notebook kernel before proceeding.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "papermill": {
     "duration": 1.321604,
     "end_time": "2022-04-17T07:17:04.141763",
     "exception": false,
     "start_time": "2022-04-17T07:17:02.820159",
     "status": "completed"
    },
    "tags": [
     "skip"
    ]
   },
   "outputs": [],
   "source": [
    "# After installation, restart the kernel.\n",
    "!pip install -r requirements.txt --user --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Imports\n",
    "\n",
    "In this section we import the packages we need for this example. Make it a habit to gather your imports in a single place. It will make your life easier if you are going to transform this notebook into a Kubeflow pipeline using Kale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": [
     "imports"
    ]
   },
   "outputs": [],
   "source": [
    "import sys, os, subprocess\n",
    "from tqdm import tqdm\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "import matplotlib.pyplot as plt\n",
    "import zipfile\n",
    "import joblib\n",
    "\n",
    "from lightgbm import LGBMRegressor\n",
    "from sklearn.metrics import mean_squared_error\n",
    "pd.set_option('display.max_columns', 500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Project hyper-parameters\n",
    "\n",
    "In this cell, we define the different hyper-parameters. Defining them in one place makes it easier to experiment with their values and also facilitates the execution of HP Tuning experiments using Kale and Katib."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": [
     "pipeline-parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# Hyper-parameters\n",
    "LR = 0.379687157316759\n",
    "N_EST = 100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "Set random seed for reproducibility and ignore warning messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": [
     "skip"
    ]
   },
   "outputs": [],
   "source": [
    "np.random.seed(2022)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Download and load the dataset\n",
    "\n",
    "In this section, we download the data from kaggle to get it in a ready-to-use form by the model. \n",
    "\n",
    "First, let us load and analyze the data.\n",
    "\n",
    "The data are in csv format, thus, we use the handy read_csv pandas method. There is one train data set and two test sets (one public and one private)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": [
     "block:load_data"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CompletedProcess(args=['kaggle', 'competitions', 'download', '-c', 'jpx-tokyo-stock-exchange-prediction'], returncode=0)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# setup kaggle environment for data download\n",
    "dataset = \"jpx-tokyo-stock-exchange-prediction\"\n",
    "\n",
    "# setup kaggle environment for data download\n",
    "with open('/secret/kaggle-secret/password', 'r') as file:\n",
    "    kaggle_key = file.read().rstrip()\n",
    "with open('/secret/kaggle-secret/username', 'r') as file:\n",
    "    kaggle_user = file.read().rstrip()\n",
    "\n",
    "os.environ['KAGGLE_USERNAME'], os.environ['KAGGLE_KEY'] = kaggle_user, kaggle_key\n",
    "\n",
    "# download kaggle's jpx-tokyo-stock-exchange-prediction data\n",
    "subprocess.run([\"kaggle\",\"competitions\", \"download\", \"-c\", dataset])"
   ]
  },
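  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "The credential handling in the cell above could also be wrapped in a small helper that fails early when a secret file is missing. This is only a sketch: the function name `load_kaggle_credentials` is hypothetical, and it assumes the same `username` and `password` file names used in this notebook.\n",
    "\n",
    "```python\n",
    "import os\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def load_kaggle_credentials(secret_dir):\n",
    "    '''Read the username and key files from secret_dir and export them for the Kaggle CLI.'''\n",
    "    secret_dir = Path(secret_dir)\n",
    "    username_file = secret_dir / 'username'\n",
    "    key_file = secret_dir / 'password'\n",
    "    missing = [str(p) for p in (username_file, key_file) if not p.is_file()]\n",
    "    if missing:\n",
    "        raise FileNotFoundError('missing Kaggle secret file(s): ' + ', '.join(missing))\n",
    "    username = username_file.read_text().strip()\n",
    "    key = key_file.read_text().strip()\n",
    "    # Same environment variables that the cell above sets.\n",
    "    os.environ['KAGGLE_USERNAME'] = username\n",
    "    os.environ['KAGGLE_KEY'] = key\n",
    "    # Return only the username so the key is never echoed in notebook output.\n",
    "    return username\n",
    "```\n",
    "\n",
    "For example, `load_kaggle_credentials('/secret/kaggle-secret')` would mirror what the cell above does."
   ]
  },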
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# path to download to\n",
    "data_path = 'data'\n",
    "\n",
    "# extract jpx-tokyo-stock-exchange-prediction.zip to load_data_path\n",
    "with zipfile.ZipFile(f\"{dataset}.zip\",\"r\") as zip_ref:\n",
    "    zip_ref.extractall(data_path)"
   ]
  },
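  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "It can be useful to check which CSV files the downloaded archive contains, for example to confirm a file path before calling `read_csv`. Below is a minimal sketch using only the standard library; `list_csv_members` is a hypothetical helper name, not part of this pipeline.\n",
    "\n",
    "```python\n",
    "import zipfile\n",
    "\n",
    "\n",
    "def list_csv_members(zip_path):\n",
    "    '''Return the sorted names of the CSV files stored in the archive.'''\n",
    "    with zipfile.ZipFile(zip_path, 'r') as zip_ref:\n",
    "        # namelist() returns every member path inside the archive.\n",
    "        return sorted(name for name in zip_ref.namelist() if name.endswith('.csv'))\n",
    "```\n",
    "\n",
    "For example, `list_csv_members(f'{dataset}.zip')` should include `train_files/stock_prices.csv` if the download matches what this notebook expects."
   ]
  },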
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# read train_files/stock_prices.csv\n",
    "df_prices = pd.read_csv(f\"{data_path}/train_files/stock_prices.csv\", parse_dates=['Date'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2021-12-03 00:00:00')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_prices['Date'].max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowId</th>\n",
       "      <th>Date</th>\n",
       "      <th>SecuritiesCode</th>\n",
       "      <th>Open</th>\n",
       "      <th>High</th>\n",
       "      <th>Low</th>\n",
       "      <th>Close</th>\n",
       "      <th>Volume</th>\n",
       "      <th>AdjustmentFactor</th>\n",
       "      <th>ExpectedDividend</th>\n",
       "      <th>SupervisionFlag</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2332528</th>\n",
       "      <td>20211203_9993</td>\n",
       "      <td>2021-12-03</td>\n",
       "      <td>9993</td>\n",
       "      <td>1690.0</td>\n",
       "      <td>1690.0</td>\n",
       "      <td>1645.0</td>\n",
       "      <td>1645.0</td>\n",
       "      <td>7200</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.004302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2332529</th>\n",
       "      <td>20211203_9994</td>\n",
       "      <td>2021-12-03</td>\n",
       "      <td>9994</td>\n",
       "      <td>2388.0</td>\n",
       "      <td>2396.0</td>\n",
       "      <td>2380.0</td>\n",
       "      <td>2389.0</td>\n",
       "      <td>6500</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>0.009098</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2332530</th>\n",
       "      <td>20211203_9997</td>\n",
       "      <td>2021-12-03</td>\n",
       "      <td>9997</td>\n",
       "      <td>690.0</td>\n",
       "      <td>711.0</td>\n",
       "      <td>686.0</td>\n",
       "      <td>696.0</td>\n",
       "      <td>381100</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>0.018414</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 RowId       Date  SecuritiesCode    Open    High     Low  \\\n",
       "2332528  20211203_9993 2021-12-03            9993  1690.0  1690.0  1645.0   \n",
       "2332529  20211203_9994 2021-12-03            9994  2388.0  2396.0  2380.0   \n",
       "2332530  20211203_9997 2021-12-03            9997   690.0   711.0   686.0   \n",
       "\n",
       "          Close  Volume  AdjustmentFactor  ExpectedDividend  SupervisionFlag  \\\n",
       "2332528  1645.0    7200               1.0               NaN            False   \n",
       "2332529  2389.0    6500               1.0               NaN            False   \n",
       "2332530   696.0  381100               1.0               NaN            False   \n",
       "\n",
       "           Target  \n",
       "2332528 -0.004302  \n",
       "2332529  0.009098  \n",
       "2332530  0.018414  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_prices.tail(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2332531, 12)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lets check data dimensions\n",
    "df_prices.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 2332531 entries, 0 to 2332530\n",
      "Data columns (total 12 columns):\n",
      " #   Column            Dtype         \n",
      "---  ------            -----         \n",
      " 0   RowId             object        \n",
      " 1   Date              datetime64[ns]\n",
      " 2   SecuritiesCode    int64         \n",
      " 3   Open              float64       \n",
      " 4   High              float64       \n",
      " 5   Low               float64       \n",
      " 6   Close             float64       \n",
      " 7   Volume            int64         \n",
      " 8   AdjustmentFactor  float64       \n",
      " 9   ExpectedDividend  float64       \n",
      " 10  SupervisionFlag   bool          \n",
      " 11  Target            float64       \n",
      "dtypes: bool(1), datetime64[ns](1), float64(7), int64(2), object(1)\n",
      "memory usage: 198.0+ MB\n"
     ]
    }
   ],
   "source": [
    "df_prices.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RowId                     0\n",
       "Date                      0\n",
       "SecuritiesCode            0\n",
       "Open                   7608\n",
       "High                   7608\n",
       "Low                    7608\n",
       "Close                  7608\n",
       "Volume                    0\n",
       "AdjustmentFactor          0\n",
       "ExpectedDividend    2313666\n",
       "SupervisionFlag           0\n",
       "Target                  238\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check total nan values per column\n",
    "df_prices.isna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Transform Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": [
     "block:transform_data",
     "prev:load_data"
    ]
   },
   "outputs": [],
   "source": [
    "# sort data by 'Date' and 'SecuritiesCode'\n",
    "df_prices.sort_values(by=['Date','SecuritiesCode'], inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>SecuritiesCode</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017-01-04</td>\n",
       "      <td>1865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2017-01-05</td>\n",
       "      <td>1865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2017-01-06</td>\n",
       "      <td>1865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2017-01-10</td>\n",
       "      <td>1865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2017-01-11</td>\n",
       "      <td>1865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1197</th>\n",
       "      <td>2021-11-29</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1198</th>\n",
       "      <td>2021-11-30</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1199</th>\n",
       "      <td>2021-12-01</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1200</th>\n",
       "      <td>2021-12-02</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1201</th>\n",
       "      <td>2021-12-03</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1202 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           Date  SecuritiesCode\n",
       "0    2017-01-04            1865\n",
       "1    2017-01-05            1865\n",
       "2    2017-01-06            1865\n",
       "3    2017-01-10            1865\n",
       "4    2017-01-11            1865\n",
       "...         ...             ...\n",
       "1197 2021-11-29            2000\n",
       "1198 2021-11-30            2000\n",
       "1199 2021-12-01            2000\n",
       "1200 2021-12-02            2000\n",
       "1201 2021-12-03            2000\n",
       "\n",
       "[1202 rows x 2 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# count total trading stocks per day \n",
    "idcount = df_prices.groupby(\"Date\")[\"SecuritiesCode\"].count().reset_index()\n",
    "idcount"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAEvCAYAAACQQh9CAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAyuElEQVR4nO3deXhV1b3/8fc3CQQCCWPCkDAnIGEUgyBWRSkObZVeqK3Wn6UVS6vWVns7YG+pta0t9rbe6tUOVlCkiletFm0FihNUKzIoCgRlliQEiEmYEsi4fn+cTYwQSHJyTvY5OZ/X85wnJ2vvvc53Z5v4Ye1hmXMOEREREQmfOL8LEBEREWnrFLhEREREwkyBS0RERCTMFLhEREREwkyBS0RERCTMFLhEREREwizB7wIa07NnTzdw4EC/yxARkVD74IPA12HD/K2jlcTY7sas9evXf+ScSz25PeID18CBA1m3bp3fZYiISKhNnhz4+tprflbRamJsd2OWmX3YULtOKYqIiIiEmQKXiIiISJgpcImIiIiEWcRfw9WQqqoq8vPzOX78uN+lxIQOHTqQkZFBu3bt/C5FREQkKkVl4MrPzyc5OZmBAwdiZn6X06Y55yguLiY/P59Bgwb5XY6IiEhUispTisePH6dHjx4KW63AzOjRo4dGE0VERFogKgMXoLDVivSzFhERaZlGA5eZ9TOzV80s18w2m9l3vPbuZrbCzLZ5X7t57WZm95vZdjN7z8zG1etrprf+NjObGb7dCq+8vDwuvvhisrOzGTFiBPfddx8AJSUlTJ06laysLKZOnUppaSkAjz/+OKNHj2bUqFFMmjSJd999t66vG264gbS0NEaOHHnGz1y2bBnDhg0jMzOTefPm1bU/8MADZGZmYmZ89NFHp91+165dTJgwgczMTL70pS9RWVkJwKpVqxg3bhwJCQk888wzQf9MRERE5PSaMsJVDfyncy4bmAjcYmbZwBzgZedcFvCy9z3AFUCW95oN/AECAQ24E5gAnAvceSKkRZuEhAR++9vfkpuby+rVq3nwwQfJzc1l3rx5TJkyhW3btjFlypS6YDRo0CBWrlzJxo0bmTt3LrNnz67r66tf/SrLli074+fV1NRwyy23sHTpUnJzc1m8eDG5ubkAnH/++bz00ksMGDDgjH388Ic/5Pbbb2f79u1069aN+fPnA9C/f38effRRvvzlL7fkRyIiIiJn0OhF8865QqDQe3/EzLYA6cA0YLK32kLgNeCHXvtjzjkHrDazrmbWx1t3hXOuBMDMVgCXA4tDuD+tok+fPvTp0weA5ORkhg8fTkFBAUuWLOE17xHCM2fOZPLkydxzzz1MmjSpbtuJEyeSn59f9/2FF17I7t27z/h5a9asITMzk8GDBwNwzTXXsGTJErKzszn77LMbrdc5xyuvvMITTzxRV9tPf/pTbrrpJk5MmxQXF7Vnl0VEItrLW/Zz4EgFB46kAbB4zQGfK4pN8WZ8cXw/3z6/WXcpmtlA4GzgLaCXF8YA9gG9vPfpQF69zfK9ttO1N/Q5swmMjtG/f//mlNjqdu/ezTvvvMOECRPYv39/XRDr3bs3+/fvP2X9+fPnc8UVVzTrMwoKCujX7+P/SDIyMnjrrbeavH1xcTFdu3YlISGhbvuCgoJm1SAiIs1XcPAYsxYGpqfbVzQRgDue3ehnSTGrQ7u46AhcZtYZ+Ctwm3PucP0LqZ1zzsxcqIpyzj0EPASQk5Nzxn5vuw02bAjVJweMHQu/+13j6x09epQZM2bwu9/9jpSUlE8sM7NTLjZ/9dVXmT9/Pq+//nroihURkYi1bf8RAP50/Tnc9XrgKppn75jiZ0kxy+/7v5oUuMysHYGw9bhz7lmveb+Z9XHOFXqnDE+MkRYA9SNkhtdWwMenIE+0vxZ86f6qqqpixowZXHfddUyfPh2AXr16UVhYSJ8+fSgsLCQtLa1u/ffee48bb7yRpUuX0qNH
jzP2nZeXx5VXXgnAN7/5TcaMGUNe3seDg/n5+aSnNzg4WOeyyy5j//795OTk8Oc//5mDBw9SXV1NQkJCk7YXEZGWKzh4DIAxGV1pnxC4dKN3lw5+liQ+aTRwWWCYZj6wxTl3b71FzwMzgXne1yX12r9lZk8SuED+kBfKlgO/rHeh/KXAHS3dgaaMRIWac45Zs2YxfPhwvvvd79a1X3XVVSxcuJA5c+awcOFCpk2bBsCePXuYPn06ixYtYujQoY32369fPzbUG7arrq5m27Zt7Nq1i/T0dJ588sm667FOZ/ny5Z/4/uKLL+aZZ57hmmuu+URtIiISPqVlgTvCu3XSTB2xrilXSp8PXA9cYmYbvNdnCAStqWa2Dfi09z3Ai8BOYDvwZ+BmAO9i+Z8Da73Xz05cQB9t3njjDRYtWsQrr7zC2LFjGTt2LC+++CJz5sxhxYoVZGVl8dJLLzFnTuDGzZ/97GcUFxdz8803M3bsWHJycur6uvbaaznvvPP44IMPyMjIqLt7sL6EhAQeeOABLrvsMoYPH84Xv/hFRowYAcD9999PRkYG+fn5jB49mhtvvLHBmu+55x7uvfdeMjMzKS4uZtasWQCsXbuWjIwMnn76ab7xjW/U9SsiIi1XUlZF58QEEhPi/S5FfGaBmwkjV05Ojlu3bt0n2rZs2cLw4cN9qig26WcuIiE3eXLgq3d3d1t0yxNv827eQV7/4SWxsLsCmNl651zOye16FoCIiEgYFB+t4OUt+5k05MzX7UpsUOASEREJg2fW53O8qpbZFw72uxSJAApcIiIiYZBbeJj0rh3JTEv2uxSJAApcIiIiYbCzqIwhaZ39LkMihAKXiIhIiDnn2FF0lME9O/ldikQIBS4REZEQKymrpLyyhgE9kvwuRSKEAlcQ8vLyuPjii8nOzmbEiBHcd999AJSUlDB16lSysrKYOnUqpaWlADz++OOMHj2aUaNGMWnSJN599926vpYtW8awYcPIzMxk3rx5DX4ewMKFC8nKyiIrK4uFCxfWtVdWVjJ79myGDh3KWWedxV//+tcGt1+/fj2jRo0iMzOTb3/725x4HMjTTz/NiBEjiIuL4+THb4iISHCOVlQDkNxBDzyVAAWuICQkJPDb3/6W3NxcVq9ezYMPPkhubi7z5s1jypQpbNu2jSlTptQFqEGDBrFy5Uo2btzI3LlzmT17NgA1NTXccsstLF26lNzcXBYvXkxubu4pn1dSUsJdd93FW2+9xZo1a7jrrrvqwtzdd99NWloaW7duJTc3l4suuqjBmm+66Sb+/Oc/s23bNrZt28ayZcsAGDlyJM8++ywXXnhhOH5UIiIx6UTg6pzY5CmLpY1T4ApCnz59GDduHADJyckMHz6cgoIClixZwsyZMwGYOXMmf/vb3wCYNGkS3boFZjSaOHEi+fn5AKxZs4bMzEwGDx5M+/btueaaa1iyZMkpn7d8+XKmTp1K9+7d6datG1OnTq0LTAsWLOCOOwIzJMXFxdGzZ89Tti8sLOTw4cNMnDgRM+MrX/lKXW3Dhw9n2LBhofvhiIgIZRU1gAKXfEyBq4V2797NO++8w4QJE9i/fz99+vQBoHfv3uzfv/+U9efPn88VV1wBQEFBAf36fTzPd0ZGBgUFBadsc7r1Dh48CMDcuXMZN24cV199dYOfWVBQQEZGRqOfIyIioVHmjXB1StSUPhIQ/dH7ttug3kTPITF2bJNmxT569CgzZszgd7/7HSkpKZ9YZmYE5v3+2Kuvvsr8+fN5/fXXQ1JmdXU1+fn5TJo0iXvvvZd7772X733veyxatCgk/YuICHztkTVsyDvYrG0qq2sBSO4Q/f+bldDQfwlBqqqqYsaMGVx33XVMnz4dgF69elFYWEifPn0oLCwkLS2tbv333nuPG2+8kaVLl9KjR2Cah/T0dPLy8urWyc/PJz09nbfeeotvfOMbQGDi6/T0dF6rN/lWfn4+kydPpkePHiQlJdV9/tVXX838+fOp
qanhnHPOAeCqq67ipptuqjuNWf9zRETkY8VHK6g5aX7hiqpaXv2giJwB3cjum3KaLRvWLak9g3vqOVwSEP2BqwkjUaHmnGPWrFkMHz6c7373u3XtV111FQsXLmTOnDksXLiQadOmAbBnzx6mT5/OokWLGDp0aN3648ePZ9u2bezatYv09HSefPJJnnjiCUaMGMGGeqN2JSUl/OhHP6q7UP6f//wnv/rVrzAzrrzySl577TUuueQSXn75ZbKzs4mPj//E9gApKSmsXr2aCRMm8Nhjj3HrrbeG7wckIhJlFq3+kLl/23Ta5TdeMJjLR/ZuxYqkrYn+wOWDN954g0WLFjFq1CjGjh0LwC9/+UvmzJnDF7/4RebPn8+AAQN46qmngMAoVXFxMTfffDMQuMtx3bp1JCQk8MADD3DZZZdRU1PDDTfcwIgRI075vO7duzN37lzGjx8PwE9+8hO6d+8OwD333MP111/PbbfdRmpqKo888kiDNf/+97/nq1/9KseOHeOKK66ou47sueee49Zbb6WoqIjPfvazjB07luXLl4f05yUiEsmqamr542s7GJmewjXj+5+yvGO7eC45K62BLUWaztxJw6eRJicnx538fKgtW7YwfPhwnyqKTfqZi0jITZ4c+Frvkgk/rNxaxMwFa3jo+nO4dET4RrEiZHclzMxsvXMu5+R23aUoIiIxbdv+IwCMH9jd50qkLVPgEhGRmLanpJyUDgl069Te71KkDVPgEhGRmFZSVknPzol+lyFtXNQGrki/9qwt0c9aRNqy0vJKuiZpzkMJr6gMXB06dKC4uFhBoBU45yguLqZDhw5+lyIiElJFRyq4fv5bvLPnIN11OlHCLCofC5GRkUF+fj5FRUV+lxITOnTo8ImpgURE2oJ38w7yr20fcXb/rswYp79xEl5RGbjatWvHoEGD/C5DRESiWEl5JQD3X3M2/bon+VyNtHVReUpRRESkpUrLAoFLdydKa1DgEhGRmFRcVkn7+Dg6tY/3uxSJAVF5SlFERCRY5ZXVrNtdyoY9B+nfIwkz87skiQEKXCIiElP+8NoO/veV7QBcNaavz9VIrFDgEhGRNqW0rJLn3imgprbhRwe9tOUAA3skce+XxjK0V3IrVyexqtHAZWYLgM8BB5xzI722McAfgc7AbuA659xhM2sHPAyM8/p+zDn3K2+by4H7gHjgYefcvNDvjoiIxLon1uzhv5d/cMZ1vjyhP+P6d2ulikSaNsL1KPAA8Fi9toeB7znnVprZDcD3gbnA1UCic26UmSUBuWa2GMgDHgSmAvnAWjN73jmXG7pdERGRWHP4eBX/vewDjlXV1LW9vaeU3ikdeOk/LzrtdrpQXlpbo4HLObfKzAae1DwUWOW9XwEsJxC4HNDJzBKAjkAlcBg4F9junNsJYGZPAtMABS4REQnaI6/vZtHqD0nv2vET7dPG9qVzoq6akcgR7H+NmwkEpr8RGNXq57U/47UXAknA7c65EjNLJzDKdUI+MCHIzxYREaGiuoZH/72LTw9P4+GZ4/0uR+SMgn0O1w3AzWa2HkgmMJIFgZGsGqAvMAj4TzMb3NzOzWy2ma0zs3WavkdERBqy79BxSsuruGxEb79LEWlUUIHLOfe+c+5S59w5wGJgh7foy8Ay51yVc+4A8AaQAxTw8SgYQIbXdrr+H3LO5TjnclJTU4MpUURE2rgS70nxPTsn+lyJSOOCClxmluZ9jQN+TOCORYA9wCXesk7AROB9YC2QZWaDzKw9cA3wfMtKFxGRWFJb68grKa97bdt/FICuSe18rkykcU15LMRiYDLQ08zygTuBzmZ2i7fKs8Aj3vsHgUfMbDNgwCPOufe8fr5F4OL6eGCBc25zKHdERETatnnL3uehVTtPaU9N1giXRL6m3KV47WkW3dfAukcJXETfUD8vAi82qzoRERHPro/KSO/akdunDq1r69G5PRndknysSqRpdM+siIhEhdKySgb0SOIL52T4XYpI
swV7l6KIiEireeHdvewuLqNbp/Z+lyISFI1wiYhIRCstq+TWxe8AMKJvis/ViARHgUtERCLS4jV7WLOrhEPHqgB46PpzuFTP3JIopcAlIiIRxznHvKXvU+sc3ZLaMzI9hfEDu/tdlkjQFLhERKRZXtxYyIOvbse5lvUzL/8QAHPu+9cpyxxw6FgVcz+XzaxPDWrZB4lEAAUuERFplr+9U0BeSTnnDurRon4SEwL3bfU9aeLpEwb37MSl2b1a9BkikUKBS0REmsw5x5Z9hzlvSA/+dH1Oyzp7JBmAh2e2sB+RKKDHQoiISJP9/b1C8kqOcWm2Ll4XaQ4FLhERaZKqmlq++9QGuia146qxff0uRySq6JSiiIic1oHDx3l7z0EAPjpaQVWN4ysTB9AuXv9eF2kOBS4RETmtuUs2sXzz/k+0fWZ0H5+qEYleClwiItKg/NJylm/ez0VDU/nh5WcB0Dkxgf49NFm0SHMpcImISIMe/tcuAD6d3YtsTakj0iI6CS8iIg3aduAII9NTuH7iAL9LEYl6ClwiInKKfYeOs2ZXiabTEQkRnVIUEZFPmLf0fZZsKKCm1nHD+ZpWRyQUNMIlIiJ1dhYd5U+rdtCzcyL/9dls+nXXBfIioaARLhGRGLTg9V38/B+5DU5A3T4hjgVfHU9qcmLrFybSRilwiYi0AcVHKygpq2zy+ity95OWnMiXxvc/Zdno9C4KWyIhpsAlIhLlKqpruOi/X+NoRXWztrtqTF++O3VomKoSkfoUuEREoljBwWP8c/M+jlZUM+tTgzi7f9cmb3vuIN2BKNJaFLhERKLYHc9uZNXWIgCuPbc/mWmdfa5IRBqiwCUiEsW27jvCZSN6ceeVI+jbtaPf5YjIaeixECIiUaqyupZ9h4+T3aeLwpZIhFPgEhGJUgfLA3cldu/c3udKRKQxClwiIlGq5ETgSlLgEol0ClwiIlHqxHO3unVq53MlItKYRgOXmS0wswNmtqle2xgze9PMNprZC2aWUm/ZaG/ZZm95B6/9HO/77WZ2v5lZeHZJRCQ2HCyvAqB7J41wiUS6poxwPQpcflLbw8Ac59wo4Dng+wBmlgD8Bfimc24EMBmo8rb5A/B1IMt7ndyniIg0Q90Il04pikS8RgOXc24VUHJS81Bglfd+BTDDe38p8J5z7l1v22LnXI2Z9QFSnHOrnXMOeAz4fAjqFxGJWaVe4OqapFOKIpEu2Gu4NgPTvPdXA/2890MBZ2bLzextM/uB154O5NfbPt9rExGRIO0/cpzkxAQSE+L9LkVEGhFs4LoBuNnM1gPJwIkZUxOATwHXeV//w8ymNLdzM5ttZuvMbF1RUVGQJYqItG0b8g4yMr2L32WISBMEFbicc+875y51zp0DLAZ2eIvygVXOuY+cc+XAi8A4oADIqNdFhtd2uv4fcs7lOOdyUlNTgylRRKRNO1pRTe7ew4wf2M3vUkSkCYIKXGaW5n2NA34M/NFbtBwYZWZJ3gX0FwG5zrlC4LCZTfTuTvwKsKTF1YuIxKj/W5tHrYNzBmoCapFo0JTHQiwG3gSGmVm+mc0CrjWzrcD7wF7gEQDnXClwL7AW2AC87Zz7h9fVzQTubtxOYERsaWh3RUQkdqzI3QfAOQM0wiUSDRqdvNo5d+1pFt13mvX/QuDRECe3rwNGNqs6ERFp0I6iMr5wTgadExv9My4iEUBPmhcRiTKHj1dRdKSCIamd/S5FRJpIgUtEJMrsLCoDYEhqJ58rEZGm0li0iEiEqKqppabWNbre1n1HABisES6RqKHAJSISATYVHOI/fv8GVTWNBy6AdvFG/+5JYa5KREJFgUtEJAK8vOUAVTWOb1+SSYf2jT85PjO1M+0TdFWISLRQ4BIRiQCLVu8mIc64fepQAo8rFJG2RP88EhHxWWlZJR8dreSzo/sobIm0UQpcIiI++91LWwH43Oi+PlciIuGiwCUi4qND5VU8vT6fqdm9mJrdy+9yRCRMFLhERHz0
5s6PKK+s4RsXDva7FBEJI100LyLSihat/pCXt+yv+z6/9BgAZ/VJ8askEWkFClwiIq3oTyt3cKyyhoxuHQHo1D6ea8/tpzkRRdo4/YaLiLSS41U1FBw8xnemZHHbp4f6XY6ItCIFLhGRMKqqqaWiuhaArfuP4ByadFokBilwiYiEyf7Dx5n2wBvsO3z8E+2ZaQpcIrFGgUtEJEx+9vdcSsor+eHlZ5EQF3igadekdpzVO9nnykSktSlwiYiEwc6io/zjvUK+MyWLmyYP8bscEfGZApeISAvV1Dr+/t5ejlXW1LW9v+8IAJ8eroeZiogCl4hIiz21Lo87nt14SntyYgJD0jr5UJGIRBoFLhGRJnrkjV3klRw7pX3ZpkJGpXfhoa+c84n25A7tSGqvP7MiosAlItIkB44c564XcklMiKN9/CdnRYuPN+6eOpQ+XTr6VJ2IRDoFLhGR0yirqOYHf32PQ+VVHKmoBuDhmTlckJXqc2UiEm00ebWIyGk88dYe/vFeIWWV1STEGRcPS2Vsv65+lyUiUUgjXCIi9eSVlHP9/Lcor6zhYHkV5w3uweLZE/0uS0SinAKXiMSsY5U1VHrT7pywcmsRu4vLmTa2L50SE/jKeQN8qk5E2hIFLhGJSXkl5Uz57Uoqa2pPWdY+Po7fXD2GdvG66kJEQkOBS0RiSmV1LRsLDvLmjmIqa2q5efIQenZO/MQ6Q9I6K2yJSEg1GrjMbAHwOeCAc26k1zYG+CPQGdgNXOecO1xvm/5ALvBT59xvvLbLgfuAeOBh59y80O6KiEjjFq3+kJ//PReAxIQ4br0ki47t432uSkTauqb8E+5R4PKT2h4G5jjnRgHPAd8/afm9wNIT35hZPPAgcAWQDVxrZtlB1iwiErQthYfp2bk9j984gX98+1MKWyLSKhoNXM65VUDJSc1DgVXe+xXAjBMLzOzzwC5gc731zwW2O+d2OucqgSeBacGXLSISnB1FR8lKS+b8zJ5kpiX7XY6IxIhgL1LYzMeB6WqgH4CZdQZ+CNx10vrpQF697/O9NhGRVpFfWs5dL2zmg31HGJyq+Q1FpHUFG7huAG42s/VAMlDptf8U+B/n3NGWFGVms81snZmtKyoqaklXIiIAPL0un0fe2E1S+3guGqonxYtI6wrqLkXn3PvApQBmNhT4rLdoAvAFM/s10BWoNbPjwHq8UTBPBlBwhv4fAh4CyMnJccHUKCLinONHz21k10dl7Cwqo3/3JFb94GK/yxKRGBRU4DKzNOfcATOLA35M4I5FnHMX1Fvnp8BR59wDZpYAZJnZIAJB6xrgyy0tXkTkTF794ACL1+QxMj2FgT07cdmI3n6XJCIxqimPhVgMTAZ6mlk+cCfQ2cxu8VZ5FnjkTH0456rN7FvAcgKPhVjgnNt8pm1EJHbd8OhaNuQdbHE/ZRXV9O3SgeduPl/P1RIRXzUauJxz155m0X2NbPfTk75/EXixyZWJSMw5WlHNwfJKXnn/ADkDujG8T0qL+7xiZG+FLRHxnZ40LyIRIa+knEt++xpVNYHLNm+8YDCXj9QpQBFpGxS4RCQibCw4RFWN49uXZNKvexKXnJXmd0kiIiGjwCUiEWHHgcDTZL45eQhJ7fWnSUTaFl3YICK+yysp5w8rd9C3SweFLRFpkxS4RMR3/7NiK+WVNYwb0M3vUkREwkKBS0R8VVVTy7PvFDC2X1fuv+Zsv8sREQkLBS4R8dWJ522N69+NuDjztxgRkTDRxRIi4ouN+Yf4xT9yKTpaAcDXzh/ob0EiImGkwCUivli6qZB1H5YycXB3cgZ0I71rR79LEhEJGwUuEWl1T63N4/ev7WBwaicev3Gi3+WIiISdruESkVb3yvsHAPjxZ4f7XImISOtQ4BKRVrej6CiXZvfikrN6+V2KiEir0ClFEWmyjfmHKC2vbFEfDthdXMaU4QpbIhI7FLhEpEmeWZ/P955+N2T9ZfdNCVlfIiKRToFLRBpVWlbJ3f/I
ZVz/rvxXCK67ahcfx8i+XUJQmYhIdFDgEpFGvbb1AKXlVcz/XDbj+mv6HRGR5tJF8yLSqMPHqgEY0D3J50pERKKTApeINOpoRSBwdUrUoLiISDAUuESkUWUV1STEGYkJ+pMhIhIM/fUUkUaVVVTTKTEBM00uLSISDAUuEWnUwjc/pGO7eL/LEBGJWgpcInJGzjnMIKObJpcWEQmWApeInNGRimqcg8tG9Pa7FBGRqKVbjkTktA6WV/LP3P0AdOvU3udqRESilwKXiJzWL1/cwlPr8gGdUhQRaQkFLhE5rQ/2HeHs/l25Z8ZostI6+12OiEjU0jVcItIg5xw7isoYnd6Fob2S9UgIEZEW0AiXiNRZvGYPa3eVAFBV6zhaUc3gVI1siYi0VKMjXGa2wMwOmNmmem1jzOxNM9toZi+YWYrXPtXM1nvt683sknrbnOO1bzez+03/XBaJKM457ln2Piu27GfthyVsyCslM60z52f28Ls0EZGo15QRrkeBB4DH6rU9DHzPObfSzG4Avg/MBT4CrnTO7TWzkcByIN3b5g/A14G3gBeBy4GlodgJEWnYkeNVfPMv6zl0rKrRdZ2Dg+VVzP1cNrM+NagVqhMRiR2NBi7n3CozG3hS81Bglfd+BYFgNdc59069dTYDHc0sEegOpDjnVgOY2WPA51HgEgmrd/MO8cb2YsYP7EZKh3aNrt+/exKXZvdqhcpERGJLsNdwbQamAX8Drgb6NbDODOBt51yFmaUD+fWW5fPxyJeIhMCOoqNM//2/OVZZU9dW4xwAD355HGkpHfwqTUQk5gUbuG4A7jezucDzQGX9hWY2ArgHuDSYzs1sNjAboH///kGWKBJb1u8u5dCxKmaeN4CkxI9/tdO7dlTYEhHxWVCByzn3Pl6YMrOhwGdPLDOzDOA54CvOuR1ecwGQUa+LDK/tdP0/BDwEkJOT44KpUSTW7PjoKO3j4/jJlSOIj9M9KSIikSSo53CZWZr3NQ74MfBH7/uuwD+AOc65N06s75wrBA6b2UTv7sSvAEtaVrqI1LfjQBkDeyYpbImIRKCmPBZiMfAmMMzM8s1sFnCtmW0F3gf2Ao94q38LyAR+YmYbvFeat+xmAnc3bgd2oAvmRUJqZ9FRhuiZWSIiEakpdylee5pF9zWw7i+AX5ymn3XAyGZVJyJNUlVTy56Scq4Y1dvvUkREpAGa2kekDfiwuJzqWsfgnhrhEhGJRApcIm3AzqKjAAzRBNMiIhFJgUukDdhRVAbA4NROPlciIiIN0eTVIlGssrqWGx9bx8b8g6QmJzbpafIiItL6FLhEoti2A0dYtbWIcf278h/jMhrfQEREfKHAJRLFdnqnEu/+j1EM75PiczUiInI6ClwiPqqtdewuLqPWBTehwtt7SjGDQT117ZaISCRT4BLx0SP/3s3P/57boj4G9kiiQ7v4EFUkIiLhoMAl4qO/vVNAz86J3HlldtB9nNU7OYQViYhIOChwifjkg31H2FhwiAuyenLlmL5+lyMiImGkwCXSit7eU8rKD4oAeHNHMfFxxq+/MNrnqkREJNwUuERa0S/+nsvbew7WfX/rJZn06dLRv4JERKRVKHCJhNlfVn/Iv3d8BEBu4WH+38T+/OLzo3yuSkREWpMCl0iY/c+KrdQ4R2rnRAb26MRlI3r7XZKIiLQyBS6RMFq9s5jiskp+9JmzmH3hEL/LERERn2jyapEw+vf2wKnEqdka1RIRiWUKXCJhtOOjMgb0SNKT4EVEYpwCl0iYHDlexb+2FjFYYUtEJOYpcImEycwFazh8vJqhehK8iEjMU+ASCYOaWsemvYcZ1iuZWy7O9LscERHxmQKXSBjsPXiMyupavnr+QFI6tPO7HBER8ZkCl0gYvLixEIAhqZ19rkRERCKBApdIGLzuPQ7irD66fktERPTgU5EWcc7x879vIa+0/BPt7+w5yFVj+up0ooiIAApcIi2y99BxFryxi/SuHUnp+HG46t89iSvH
9PWxMhERiSQKXCInWbxmD39auaNJ61ZU1wLwm6vHcN6QHuEsS0REopgCl8Q05xyVNbWfaHt+w16OVtRwfmbTAlRKh3ac3b9rGKoTEZG2QoFLYtq3n9zAC+/uPaV9+rh07v3i2NYvSERE2qRGA5eZLQA+Bxxwzo302sYAfwQ6A7uB65xzh71ldwCzgBrg28655V775cB9QDzwsHNuXsj3RqSJamsdL24q5IV393L5iN6MyuhSt8wMPjOyj4/ViYhIW9OUEa5HgQeAx+q1PQx8zzm30sxuAL4PzDWzbOAaYATQF3jJzIZ62zwITAXygbVm9rxzLjc0uyHSPM+sz+cHf32P9glx/HL6KLp3au93SSIi0oY1+hwu59wqoOSk5qHAKu/9CmCG934a8KRzrsI5twvYDpzrvbY753Y65yqBJ711RVpdSVkl97+yDYAXvvUphS0REQm7YB98upmPA9PVQD/vfTqQV2+9fK/tdO0NMrPZZrbOzNYVFRUFWaJIw/64cgf5pcc4b3APhmliaRERaQXBBq4bgJvNbD2QDFSGriRwzj3knMtxzuWkpqaGsmsRDhw+TvuEOB752ni/SxERkRgR1F2Kzrn3gUsBvGu0PustKuDj0S6ADK+NM7SLtKqS8iqG906mQ7t4v0sREZEYEdQIl5mleV/jgB8TuGMR4HngGjNLNLNBQBawBlgLZJnZIDNrT+DC+udbWrxIMErLKumm67ZERKQVNRq4zGwx8CYwzMzyzWwWcK2ZbQXeB/YCjwA45zYDTwG5wDLgFudcjXOuGvgWsBzYAjzlrSvSqsoqqvlg/xEG9+zsdykiIhJDGj2l6Jy79jSL7jvN+ncDdzfQ/iLwYrOqEwmhpRsLueuFXCqra/l0dprf5YiISAzRk+Yl4hyvqqHWuZD3u3TTPsoqqvnGRYOZMEjzHoqISOtR4JKI8sK7e7l18Tth6/+CrJ7cccXwsPUvIiLSEAUuiSivvH+Abknt+OZFQ8LS/0XD9JgRERFpfQpcElHW7i5h4uAefCNMgUtERMQPwT74VCTkNhUcIr/0GDkDu/tdioiISEgpcEnEeGpdYPanT2X29LkSERGR0FLgkohRWl5FWnKi5jcUEZE2R4FLIkZpWSUZ3Tr6XYaIiEjIKXBJxCgpq6S7ptwREZE2SIFLIsbB8kq6JilwiYhI26PAJRGjpFwjXCIi0jYpcElEKDpSwfGqWrpphEtERNogPfhUfPfwv3byi39sAaBHZwUuERFpexS4xHfrdpfSKyWRb12SxWdG9fG7HBERkZBT4BLf7Sg6yuiMrlw/cYDfpYiIiISFruESX1XX1PJhcTlDUjv7XYqIiEjYKHCJr/JLj1FZU8vg1E5+lyIiIhI2OqUoraroSAV/WrmD6loHQOGhYwAa4RIRkTZNgUta1f++so1Fqz8kpUO7urYhqZ04S/MniohIG6bAJSHzt3cK+L+1eWdcZ/2eUq4+J4Nff2FMK1UlIiLiP13DJSGzaPWH5BYepqbWnfZ17sDufOviLL9LFRERaVUa4ZJm27z3ELMfW09Fdc0n2kvKKvnS+P78avoonyoTERGJTApc0izHq2p4ecsBCg4e49pz+xFnVrcszozrJvb3sToREZHIpMAlTVZVU8v5816huKySnp3b86vpo/0uSUREJCoocEmTlZZVUlxWyVVj+nL9eXoqvIiISFPponlpspLySgAuH9mb8QO7+1yNiIhI9FDgkiYrKQsErm5J7X2uREREJLo0KXCZ2QIzO2Bmm+q1jTWz1Wa2wczWmdm5XnsXM3vBzN41s81m9rV628w0s23ea2bod0fC6cDhCgBSkxW4REREmqOpI1yPApef1PZr4C7n3FjgJ973ALcAuc65McBk4Ldm1t7MugN3AhOAc4E7zaxbi6qXVvXEmj0A9Oue5HMlIiIi0aVJgcs5twooObkZSPHedwH21mtPNjMDOnvbVQOXASuccyXOuVJgBaeGOIlQzjk2Fxyie6f2
JCbE+12OiIhIVGnJXYq3AcvN7DcEgtskr/0B4HkCASwZ+JJzrtbM0oH6877kA+kt+HxpJc45vvmX9ZRV1vCDy8/yuxwREZGo05KL5m8CbnfO9QNuB+Z77ZcBG4C+wFjgATNLaaiD0zGz2d51YeuKiopaUKKEwoEjFSzfvJ+EOOOSs9L8LkdERCTqtGSEaybwHe/908DD3vuvAfOccw7Ybma7gLOAAgLXdJ2QAbzWUMfOuYeAhwBycnJcC2qUZnjh3b38ZMkmak/6idd4DY9+7VxdvyUiIhKElgSuvcBFBELTJcA2r30PMAX4l5n1AoYBO4HtwC/rXSh/KXBHCz5fmuBoRTXlFdVNWnfZ5n1U1zpmjMs4ZVnnxATGD9I9DiIiIsFoUuAys8UERqd6mlk+gbsNvw7cZ2YJwHFgtrf6z4FHzWwjYMAPnXMfef38HFjrrfcz59zJF+JLCJWWVTJp3iscq6ppfGXPBVk9+elVI8JYlYiISOxpUuByzl17mkXnNLDuXgKjVw31swBY0OTq2oADh4/zwf4jvnz21v1HOVZVw9cvGMTAnp2atM15g3uEuSoREZHYo7kUw+zbT77D6p3+DeTFGXz9gsGkpXTwrQYREZFYp8AVZoWHjnNBVk++MyXLl8/v1qm9wpaIiIjPFLjCrKSskouHpZGjyZ5FRERiliavDqOqmlqOHK/WZM8iIiIxToErjPYdOg5AWkqiz5WIiIiInxS4wuixN3cDMCS1s7+FiIiIiK8UuMLkeFUNz75dQGpyIqMzuvhdjoiIiPhIF82HyfMb9lJcVsn/zZ5Ih3bxfpcjIiIiPtIIV5i8ubOYrkntOHeQ7k4UERGJdQpcYVBRXcNz7xTQt0tHzMzvckRERMRnClxhUFJWCQTmJRQRERFR4AqDE4Hr7P7dfK5EREREIoECVxiUllUB0L2THngqIiIiClxhsf3AEQD6dNEchiIiIqLAFRYvbTnAkNRO9Oue5HcpIiIiEgEUuELs0LEqVu8sZmp2b79LERERkQihwBViK7cWUV3rmJrdy+9SREREJEIocIXYe3kHSUyIY2y/rn6XIiIiIhFCU/uEwLb9R7j6T29yrLKGqppahvVOIT5ODzwVERGRAAWuIFXV1LL34DEA/pm7n4PlVcw8bwAd2sdzYVaqz9WJiIhIJFHgCtJ/PbeRp9bl133fOTGBO68cQZxGtkREROQkMR+43txRzNGK6mZv99auEsZkdGHmpIEADOrZSWFLREREGhTzgetnf89lS+HhoLadNiWL6eMyQlyRiIiItDUxH7j+99qxHK+qbfZ2ZjC0V3IYKhIREZG2JuYDV2aaQpOIiIiEl57DJSIiIhJmClwiIiIiYabAJSIiIhJmjQYuM1tgZgfMbFO9trFmttrMNpjZOjM7t96yyV77ZjNbWa/9cjP7wMy2m9mc0O+KiIiISGRqygjXo8DlJ7X9GrjLOTcW+In3PWbWFfg9cJVzbgRwtdceDzwIXAFkA9eaWXbLyxcRERGJfI0GLufcKqDk5GYgxXvfBdjrvf8y8Kxzbo+37QGv/Vxgu3Nup3OuEngSmNbC2kVERESiQrCPhbgNWG5mvyEQ2iZ57UOBdmb2GpAM3OecewxIB/LqbZ8PTAjys0VERESiSrCB6ybgdufcX83si8B84NNef+cAU4COwJtmtrq5nZvZbGA2QP/+/YMsUURERCQyBHuX4kzgWe/90wROGUJg5Gq5c67MOfcRsAoYAxQA/eptn+G1Ncg595BzLsc5l5OamhpkiSIiIiKRIdjAtRe4yHt/CbDNe78E+JSZJZhZEoHThluAtUCWmQ0ys/bANcDzwZctIiIiEj0aPaVoZouByUBPM8sH7gS+DtxnZgnAcbzTf865LWa2DHgPqAUeds5t8vr5FrAciAcWOOc2N6XA9evXf2RmHzZ3x5qhJ/BRGPuX4OnYRDYdn8gVXcfGzO8KWlNPsyg6NrElVL83AxpqNOdcCPqOXma2zjmX
43cdciodm8im4xO5dGwil45N5Ar3sdGT5kVERETCTIFLREREJMwUuOAhvwuQ09KxiWw6PpFLxyZy6dhErrAem5i/hktEREQk3DTCJSIiIhJmbS5wmVk/M3vVzHLNbLOZfcdr725mK8xsm/e1m9d+lpm9aWYVZva9ev0MM7MN9V6Hzew2n3arTQjVsfGW3e71scnMFptZBz/2qS0J8fH5jndsNuv3puWCODbXmdl7ZrbRzP5tZmPq9XW5mX1gZtvNbI5f+9RWhPjYLDCzA2a2ya/9aUtCdWxO10+zOefa1AvoA4zz3icDW4Fs4NfAHK99DnCP9z4NGA/cDXzvNH3GA/uAAX7vXzS/QnVsCMzNuQvo6H3/FPBVv/cv2l8hPD4jgU1AEoFn/b0EZPq9f9H8CuLYTAK6ee+vAN7y3scDO4DBQHvgXSDb7/2L5leojo33/YXAOGCT3/vVFl4h/L1psJ/m1tPmRricc4XOube990cIPOk+HZgGLPRWWwh83lvngHNuLVB1hm6nADucc+F8AGubF+JjkwB09B6+m0Rg9gNpgRAen+EE/lCVO+eqgZXA9PDvQdsVxLH5t3Ou1GtfTWA6NQhMw7bdObfTOVcJPOn1IUEK4bHBObcKKGmdytu+UB2bM/TTLG0ucNVnZgOBs4G3gF7OuUJv0T6gVzO6ugZYHNrqYltLjo1zrgD4DbAHKAQOOef+Gb5qY08Lf3c2AReYWQ8LTPH1GT45l6q0QBDHZhaw1HufDuTVW5ZPEP/jkIa18NhIGIXq2JzUT7M0OrVPtDKzzsBfgducc4et3tQRzjlnZk26PdMCcz9eBdwRlkJjUEuPjXe+fRowCDgIPG1m/88595fwVR07Wnp8XGCKr3uAfwJlwAagJnwVx47mHhszu5jA/zg+1aqFxiAdm8gVqmNzcj/NraNNjnCZWTsCP5THnXPPes37zayPt7wPcKCJ3V0BvO2c2x/6SmNPiI7Np4Fdzrki51wV8CyBc+/SQqH63XHOzXfOneOcuxAoJXDNg7RAc4+NmY0GHgamOeeKveYCPjnamOG1SQuE6NhIGITq2Jymn2Zpc4HLAtF1PrDFOXdvvUXPAzO99zOBJU3s8lp0OjEkQnhs9gATzSzJ63MKgXPq0gKh/N0xszTva38C1289EdpqY0tzj433c38WuN45Vz/srgWyzGyQN3p/jdeHBCmEx0ZCLFTH5gz9NE9zr7KP9BeBIUAHvEfgVMYGAteQ9ABeBrYRuGuqu7d+bwLXMRwmcHoqH0jxlnUCioEufu9XW3iF+NjcBbxP4HqhRUCi3/sX7a8QH59/AbkE7oKb4ve+RfsriGPzMIGRxRPrrqvX12cIjDjuAP7L732L9leIj81iAtelVnm/T7P83r9ofoXq2Jyun+bWoyfNi4iIiIRZmzulKCIiIhJpFLhEREREwkyBS0RERCTMFLhEREREwkyBS0RERCTMFLhEREREwkyBS0RERCTMFLhEREREwuz/A3o3k/vsMNiIAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 720x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(figsize=(10, 5))\n",
    "plt.plot(idcount[\"Date\"],idcount[\"SecuritiesCode\"])\n",
    "plt.axvline(x=['2021-01-01'], color='blue', label='2021-01-01')\n",
    "plt.axvline(x=['2020-06-01'], color='red', label='2020-06-01')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>SecuritiesCode</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>970</th>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>971</th>\n",
       "      <td>2020-12-24</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>972</th>\n",
       "      <td>2020-12-25</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>973</th>\n",
       "      <td>2020-12-28</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>974</th>\n",
       "      <td>2020-12-29</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1197</th>\n",
       "      <td>2021-11-29</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1198</th>\n",
       "      <td>2021-11-30</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1199</th>\n",
       "      <td>2021-12-01</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1200</th>\n",
       "      <td>2021-12-02</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1201</th>\n",
       "      <td>2021-12-03</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>232 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           Date  SecuritiesCode\n",
       "970  2020-12-23            2000\n",
       "971  2020-12-24            2000\n",
       "972  2020-12-25            2000\n",
       "973  2020-12-28            2000\n",
       "974  2020-12-29            2000\n",
       "...         ...             ...\n",
       "1197 2021-11-29            2000\n",
       "1198 2021-11-30            2000\n",
       "1199 2021-12-01            2000\n",
       "1200 2021-12-02            2000\n",
       "1201 2021-12-03            2000\n",
       "\n",
       "[232 rows x 2 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "idcount[idcount['SecuritiesCode'] >= 2000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "464000"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "idcount[idcount['SecuritiesCode'] >= 2000]['SecuritiesCode'].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# filter out data with less than 2000 stock counts in a day\n",
    "# dates before ‘2020-12-23’ all have stock counts less than 2000\n",
    "# This is done to work with consistent data \n",
    "df_prices = df_prices[(df_prices[\"Date\"]>=\"2020-12-23\")]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df_prices = df_prices.reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowId</th>\n",
       "      <th>Date</th>\n",
       "      <th>SecuritiesCode</th>\n",
       "      <th>Open</th>\n",
       "      <th>High</th>\n",
       "      <th>Low</th>\n",
       "      <th>Close</th>\n",
       "      <th>Volume</th>\n",
       "      <th>AdjustmentFactor</th>\n",
       "      <th>ExpectedDividend</th>\n",
       "      <th>SupervisionFlag</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>20201223_1301</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1301</td>\n",
       "      <td>2913.0</td>\n",
       "      <td>2920.0</td>\n",
       "      <td>2906.0</td>\n",
       "      <td>2913.0</td>\n",
       "      <td>6300</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.000343</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20201223_1332</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1332</td>\n",
       "      <td>419.0</td>\n",
       "      <td>421.0</td>\n",
       "      <td>416.0</td>\n",
       "      <td>419.0</td>\n",
       "      <td>1413600</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>0.007143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>20201223_1333</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1333</td>\n",
       "      <td>2187.0</td>\n",
       "      <td>2195.0</td>\n",
       "      <td>2158.0</td>\n",
       "      <td>2165.0</td>\n",
       "      <td>119000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>0.005051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>20201223_1375</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1375</td>\n",
       "      <td>1711.0</td>\n",
       "      <td>1757.0</td>\n",
       "      <td>1701.0</td>\n",
       "      <td>1752.0</td>\n",
       "      <td>446300</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.003484</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20201223_1376</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1376</td>\n",
       "      <td>1589.0</td>\n",
       "      <td>1589.0</td>\n",
       "      <td>1575.0</td>\n",
       "      <td>1586.0</td>\n",
       "      <td>1900</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.009494</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           RowId       Date  SecuritiesCode    Open    High     Low   Close  \\\n",
       "0  20201223_1301 2020-12-23            1301  2913.0  2920.0  2906.0  2913.0   \n",
       "1  20201223_1332 2020-12-23            1332   419.0   421.0   416.0   419.0   \n",
       "2  20201223_1333 2020-12-23            1333  2187.0  2195.0  2158.0  2165.0   \n",
       "3  20201223_1375 2020-12-23            1375  1711.0  1757.0  1701.0  1752.0   \n",
       "4  20201223_1376 2020-12-23            1376  1589.0  1589.0  1575.0  1586.0   \n",
       "\n",
       "    Volume  AdjustmentFactor  ExpectedDividend  SupervisionFlag    Target  \n",
       "0     6300               1.0               NaN            False -0.000343  \n",
       "1  1413600               1.0               NaN            False  0.007143  \n",
       "2   119000               1.0               NaN            False  0.005051  \n",
       "3   446300               1.0               NaN            False -0.003484  \n",
       "4     1900               1.0               NaN            False -0.009494  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_prices.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['RowId', 'Date', 'SecuritiesCode', 'Open', 'High', 'Low', 'Close',\n",
       "       'Volume', 'AdjustmentFactor', 'ExpectedDividend', 'SupervisionFlag',\n",
       "       'Target'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_prices.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#calculate z-scores of `df`\n",
    "z_scores = stats.zscore(df_prices[['Open', 'High', 'Low', 'Close','Volume']], nan_policy='omit')\n",
    "abs_z_scores = np.abs(z_scores)\n",
    "filtered_entries = (abs_z_scores < 3).all(axis=1)\n",
    "df_zscore = df_prices[filtered_entries]\n",
    "df_zscore = df_zscore.reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "df_zscore = df_zscore.reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.01421,
     "end_time": "2022-04-17T07:17:13.396620",
     "exception": false,
     "start_time": "2022-04-17T07:17:13.382410",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "<h1>Feature Engineering\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "tags": [
     "block:feature_engineering",
     "prev:transform_data"
    ]
   },
   "outputs": [],
   "source": [
    "def feat_eng(df, features):\n",
    "\n",
    "    for i in tqdm(range(1, 4)):\n",
    "        # creating lag features\n",
    "        tmp = df[features].shift(i)\n",
    "        tmp.columns = [c + f'_next_shift_{i}' for c in tmp.columns]\n",
    "        df = pd.concat([df, tmp], sort=False, axis=1)\n",
    "\n",
    "    for i in tqdm(range(1, 4)):\n",
    "        df[f'weighted_vol_price_{i}'] = np.log(df[f'Volume_next_shift_{i}'] * df[[col for col in df if col.endswith(f'next_shift_{i}')][:-1]].apply(np.mean, axis=1))\n",
    "       \n",
    "    # feature engineering\n",
    "    df['weighted_vol_price'] = np.log(df['Volume'] * (np.mean(df[features[:-1]], axis=1)))\n",
    "    df['BOP'] = (df['Open']-df['Close'])/(df['High']-df['Low'])\n",
    "    df['HL'] = df['High'] - df['Low']\n",
    "    df['OC'] = df['Close'] - df['Open']\n",
    "    df['OHLCstd'] = df[['Open','Close','High','Low']].std(axis=1)\n",
    "    \n",
    "    feats = df.select_dtypes(include=float).columns\n",
    "    df[feats] = df[feats].apply(np.log)\n",
    "    \n",
    "    # replace inf with nan\n",
    "    df.replace([np.inf, -np.inf], np.nan, inplace=True)\n",
    "    \n",
    "    # datetime features\n",
    "    df['Date'] = pd.to_datetime(df['Date'])\n",
    "    df['Day'] = df['Date'].dt.weekday.astype(np.int32)\n",
    "    df[\"dayofyear\"] = df['Date'].dt.dayofyear\n",
    "    df[\"is_weekend\"] = df['Day'].isin([5, 6])\n",
    "    df[\"weekofyear\"] = df['Date'].dt.weekofyear\n",
    "    df[\"month\"] = df['Date'].dt.month\n",
    "    df[\"season\"] = (df[\"month\"]%12 + 3)//3\n",
    "    \n",
    "    # fill nan values\n",
    "    df = df.fillna(0)\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 3/3 [00:00<00:00, 12.99it/s]\n",
      "100%|██████████| 3/3 [02:58<00:00, 59.46s/it]\n",
      "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:30: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.\n"
     ]
    }
   ],
   "source": [
    "new_feats = feat_eng(df_zscore, ['High', 'Low', 'Open', 'Close', 'Volume'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(452481, 41)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_feats.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "new_feats['Target'] = df_zscore['Target']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>RowId</th>\n",
       "      <th>Date</th>\n",
       "      <th>SecuritiesCode</th>\n",
       "      <th>Open</th>\n",
       "      <th>High</th>\n",
       "      <th>Low</th>\n",
       "      <th>Close</th>\n",
       "      <th>Volume</th>\n",
       "      <th>AdjustmentFactor</th>\n",
       "      <th>ExpectedDividend</th>\n",
       "      <th>SupervisionFlag</th>\n",
       "      <th>Target</th>\n",
       "      <th>High_next_shift_1</th>\n",
       "      <th>Low_next_shift_1</th>\n",
       "      <th>Open_next_shift_1</th>\n",
       "      <th>Close_next_shift_1</th>\n",
       "      <th>Volume_next_shift_1</th>\n",
       "      <th>High_next_shift_2</th>\n",
       "      <th>Low_next_shift_2</th>\n",
       "      <th>Open_next_shift_2</th>\n",
       "      <th>Close_next_shift_2</th>\n",
       "      <th>Volume_next_shift_2</th>\n",
       "      <th>High_next_shift_3</th>\n",
       "      <th>Low_next_shift_3</th>\n",
       "      <th>Open_next_shift_3</th>\n",
       "      <th>Close_next_shift_3</th>\n",
       "      <th>Volume_next_shift_3</th>\n",
       "      <th>weighted_vol_price_1</th>\n",
       "      <th>weighted_vol_price_2</th>\n",
       "      <th>weighted_vol_price_3</th>\n",
       "      <th>weighted_vol_price</th>\n",
       "      <th>BOP</th>\n",
       "      <th>HL</th>\n",
       "      <th>OC</th>\n",
       "      <th>OHLCstd</th>\n",
       "      <th>Day</th>\n",
       "      <th>dayofyear</th>\n",
       "      <th>is_weekend</th>\n",
       "      <th>weekofyear</th>\n",
       "      <th>month</th>\n",
       "      <th>season</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>20201223_1301</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1301</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>7.979339</td>\n",
       "      <td>7.974533</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>6300</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.000343</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.816919</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.639057</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.743178</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20201223_1332</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1332</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>6.042633</td>\n",
       "      <td>6.030685</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>1413600</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>0.007143</td>\n",
       "      <td>7.979339</td>\n",
       "      <td>7.974533</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>8.748305</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.816919</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.005629</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.609438</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.723459</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>20201223_1333</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1333</td>\n",
       "      <td>7.690286</td>\n",
       "      <td>7.693937</td>\n",
       "      <td>7.676937</td>\n",
       "      <td>7.680176</td>\n",
       "      <td>119000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>0.005051</td>\n",
       "      <td>6.042633</td>\n",
       "      <td>6.030685</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>14.161650</td>\n",
       "      <td>7.979339</td>\n",
       "      <td>7.974533</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>8.748305</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.005629</td>\n",
       "      <td>2.816919</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.963841</td>\n",
       "      <td>-0.519875</td>\n",
       "      <td>3.610918</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.866536</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>20201223_1375</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1375</td>\n",
       "      <td>7.444833</td>\n",
       "      <td>7.471363</td>\n",
       "      <td>7.438972</td>\n",
       "      <td>7.468513</td>\n",
       "      <td>446300</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.003484</td>\n",
       "      <td>7.693937</td>\n",
       "      <td>7.676937</td>\n",
       "      <td>7.690286</td>\n",
       "      <td>7.680176</td>\n",
       "      <td>11.686879</td>\n",
       "      <td>6.042633</td>\n",
       "      <td>6.030685</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>14.161650</td>\n",
       "      <td>7.979339</td>\n",
       "      <td>7.974533</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>7.976939</td>\n",
       "      <td>8.748305</td>\n",
       "      <td>2.963841</td>\n",
       "      <td>3.005629</td>\n",
       "      <td>2.816919</td>\n",
       "      <td>3.018705</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>4.025352</td>\n",
       "      <td>3.713572</td>\n",
       "      <td>3.345369</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20201223_1376</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1376</td>\n",
       "      <td>7.370860</td>\n",
       "      <td>7.370860</td>\n",
       "      <td>7.362011</td>\n",
       "      <td>7.368970</td>\n",
       "      <td>1900</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>-0.009494</td>\n",
       "      <td>7.471363</td>\n",
       "      <td>7.438972</td>\n",
       "      <td>7.444833</td>\n",
       "      <td>7.468513</td>\n",
       "      <td>13.008747</td>\n",
       "      <td>7.693937</td>\n",
       "      <td>7.676937</td>\n",
       "      <td>7.690286</td>\n",
       "      <td>7.680176</td>\n",
       "      <td>11.686879</td>\n",
       "      <td>6.042633</td>\n",
       "      <td>6.030685</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>6.037871</td>\n",
       "      <td>14.161650</td>\n",
       "      <td>3.018705</td>\n",
       "      <td>2.963841</td>\n",
       "      <td>3.005629</td>\n",
       "      <td>2.702555</td>\n",
       "      <td>-1.540445</td>\n",
       "      <td>2.639057</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.894928</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>20201223_1377</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1377</td>\n",
       "      <td>8.167636</td>\n",
       "      <td>8.173293</td>\n",
       "      <td>8.160518</td>\n",
       "      <td>8.167636</td>\n",
       "      <td>62700</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>0.011252</td>\n",
       "      <td>7.370860</td>\n",
       "      <td>7.362011</td>\n",
       "      <td>7.370860</td>\n",
       "      <td>7.368970</td>\n",
       "      <td>7.549609</td>\n",
       "      <td>7.471363</td>\n",
       "      <td>7.438972</td>\n",
       "      <td>7.444833</td>\n",
       "      <td>7.468513</td>\n",
       "      <td>13.008747</td>\n",
       "      <td>7.693937</td>\n",
       "      <td>7.676937</td>\n",
       "      <td>7.690286</td>\n",
       "      <td>7.680176</td>\n",
       "      <td>11.686879</td>\n",
       "      <td>2.702555</td>\n",
       "      <td>3.018705</td>\n",
       "      <td>2.963841</td>\n",
       "      <td>2.955608</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.806662</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.913860</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>20201223_1379</td>\n",
       "      <td>2020-12-23</td>\n",
       "      <td>1379</td>\n",
       "      <td>7.652546</td>\n",
       "      <td>7.656337</td>\n",
       "      <td>7.647786</td>\n",
       "      <td>7.653969</td>\n",
       "      <td>29900</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>False</td>\n",
       "      <td>0.002373</td>\n",
       "      <td>8.173293</td>\n",
       "      <td>8.160518</td>\n",
       "      <td>8.167636</td>\n",
       "      <td>8.167636</td>\n",
       "      <td>11.046117</td>\n",
       "      <td>7.370860</td>\n",
       "      <td>7.362011</td>\n",
       "      <td>7.370860</td>\n",
       "      <td>7.368970</td>\n",
       "      <td>7.549609</td>\n",
       "      <td>7.471363</td>\n",
       "      <td>7.438972</td>\n",
       "      <td>7.444833</td>\n",
       "      <td>7.468513</td>\n",
       "      <td>13.008747</td>\n",
       "      <td>2.955608</td>\n",
       "      <td>2.702555</td>\n",
       "      <td>3.018705</td>\n",
       "      <td>2.888051</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.890372</td>\n",
       "      <td>1.098612</td>\n",
       "      <td>2.026617</td>\n",
       "      <td>2</td>\n",
       "      <td>358</td>\n",
       "      <td>False</td>\n",
       "      <td>52</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           RowId       Date  SecuritiesCode      Open      High       Low  \\\n",
       "0  20201223_1301 2020-12-23            1301  7.976939  7.979339  7.974533   \n",
       "1  20201223_1332 2020-12-23            1332  6.037871  6.042633  6.030685   \n",
       "2  20201223_1333 2020-12-23            1333  7.690286  7.693937  7.676937   \n",
       "3  20201223_1375 2020-12-23            1375  7.444833  7.471363  7.438972   \n",
       "4  20201223_1376 2020-12-23            1376  7.370860  7.370860  7.362011   \n",
       "5  20201223_1377 2020-12-23            1377  8.167636  8.173293  8.160518   \n",
       "6  20201223_1379 2020-12-23            1379  7.652546  7.656337  7.647786   \n",
       "\n",
       "      Close   Volume  AdjustmentFactor  ExpectedDividend  SupervisionFlag  \\\n",
       "0  7.976939     6300               0.0               0.0            False   \n",
       "1  6.037871  1413600               0.0               0.0            False   \n",
       "2  7.680176   119000               0.0               0.0            False   \n",
       "3  7.468513   446300               0.0               0.0            False   \n",
       "4  7.368970     1900               0.0               0.0            False   \n",
       "5  8.167636    62700               0.0               0.0            False   \n",
       "6  7.653969    29900               0.0               0.0            False   \n",
       "\n",
       "     Target  High_next_shift_1  Low_next_shift_1  Open_next_shift_1  \\\n",
       "0 -0.000343           0.000000          0.000000           0.000000   \n",
       "1  0.007143           7.979339          7.974533           7.976939   \n",
       "2  0.005051           6.042633          6.030685           6.037871   \n",
       "3 -0.003484           7.693937          7.676937           7.690286   \n",
       "4 -0.009494           7.471363          7.438972           7.444833   \n",
       "5  0.011252           7.370860          7.362011           7.370860   \n",
       "6  0.002373           8.173293          8.160518           8.167636   \n",
       "\n",
       "   Close_next_shift_1  Volume_next_shift_1  High_next_shift_2  \\\n",
       "0            0.000000             0.000000           0.000000   \n",
       "1            7.976939             8.748305           0.000000   \n",
       "2            6.037871            14.161650           7.979339   \n",
       "3            7.680176            11.686879           6.042633   \n",
       "4            7.468513            13.008747           7.693937   \n",
       "5            7.368970             7.549609           7.471363   \n",
       "6            8.167636            11.046117           7.370860   \n",
       "\n",
       "   Low_next_shift_2  Open_next_shift_2  Close_next_shift_2  \\\n",
       "0          0.000000           0.000000            0.000000   \n",
       "1          0.000000           0.000000            0.000000   \n",
       "2          7.974533           7.976939            7.976939   \n",
       "3          6.030685           6.037871            6.037871   \n",
       "4          7.676937           7.690286            7.680176   \n",
       "5          7.438972           7.444833            7.468513   \n",
       "6          7.362011           7.370860            7.368970   \n",
       "\n",
       "   Volume_next_shift_2  High_next_shift_3  Low_next_shift_3  \\\n",
       "0             0.000000           0.000000          0.000000   \n",
       "1             0.000000           0.000000          0.000000   \n",
       "2             8.748305           0.000000          0.000000   \n",
       "3            14.161650           7.979339          7.974533   \n",
       "4            11.686879           6.042633          6.030685   \n",
       "5            13.008747           7.693937          7.676937   \n",
       "6             7.549609           7.471363          7.438972   \n",
       "\n",
       "   Open_next_shift_3  Close_next_shift_3  Volume_next_shift_3  \\\n",
       "0           0.000000            0.000000             0.000000   \n",
       "1           0.000000            0.000000             0.000000   \n",
       "2           0.000000            0.000000             0.000000   \n",
       "3           7.976939            7.976939             8.748305   \n",
       "4           6.037871            6.037871            14.161650   \n",
       "5           7.690286            7.680176            11.686879   \n",
       "6           7.444833            7.468513            13.008747   \n",
       "\n",
       "   weighted_vol_price_1  weighted_vol_price_2  weighted_vol_price_3  \\\n",
       "0              0.000000              0.000000              0.000000   \n",
       "1              2.816919              0.000000              0.000000   \n",
       "2              3.005629              2.816919              0.000000   \n",
       "3              2.963841              3.005629              2.816919   \n",
       "4              3.018705              2.963841              3.005629   \n",
       "5              2.702555              3.018705              2.963841   \n",
       "6              2.955608              2.702555              3.018705   \n",
       "\n",
       "   weighted_vol_price       BOP        HL        OC   OHLCstd  Day  dayofyear  \\\n",
       "0            2.816919  0.000000  2.639057  0.000000  1.743178    2        358   \n",
       "1            3.005629  0.000000  1.609438  0.000000  0.723459    2        358   \n",
       "2            2.963841 -0.519875  3.610918  0.000000  2.866536    2        358   \n",
       "3            3.018705  0.000000  4.025352  3.713572  3.345369    2        358   \n",
       "4            2.702555 -1.540445  2.639057  0.000000  1.894928    2        358   \n",
       "5            2.955608  0.000000  3.806662  0.000000  2.913860    2        358   \n",
       "6            2.888051  0.000000  2.890372  1.098612  2.026617    2        358   \n",
       "\n",
       "   is_weekend  weekofyear  month  season  \n",
       "0       False          52     12       1  \n",
       "1       False          52     12       1  \n",
       "2       False          52     12       1  \n",
       "3       False          52     12       1  \n",
       "4       False          52     12       1  \n",
       "5       False          52     12       1  \n",
       "6       False          52     12       1  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_feats.head(7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['RowId', 'Date', 'SecuritiesCode', 'Open', 'High', 'Low', 'Close',\n",
       "       'Volume', 'AdjustmentFactor', 'ExpectedDividend', 'SupervisionFlag',\n",
       "       'Target', 'High_next_shift_1', 'Low_next_shift_1', 'Open_next_shift_1',\n",
       "       'Close_next_shift_1', 'Volume_next_shift_1', 'High_next_shift_2',\n",
       "       'Low_next_shift_2', 'Open_next_shift_2', 'Close_next_shift_2',\n",
       "       'Volume_next_shift_2', 'High_next_shift_3', 'Low_next_shift_3',\n",
       "       'Open_next_shift_3', 'Close_next_shift_3', 'Volume_next_shift_3',\n",
       "       'weighted_vol_price_1', 'weighted_vol_price_2', 'weighted_vol_price_3',\n",
       "       'weighted_vol_price', 'BOP', 'HL', 'OC', 'OHLCstd', 'Day', 'dayofyear',\n",
       "       'is_weekend', 'weekofyear', 'month', 'season'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_feats.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Modelling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "tags": [
     "block:modelling",
     "prev:feature_engineering"
    ]
   },
   "outputs": [],
   "source": [
    "# columns used as model features\n",
    "feats = ['Date', 'SecuritiesCode', 'Open', 'High', 'Low', 'Close', 'Volume',\n",
    "         'weighted_vol_price_1', 'weighted_vol_price_2', 'weighted_vol_price_3',\n",
    "         'weighted_vol_price', 'BOP', 'HL', 'OC', 'OHLCstd', 'Day', 'dayofyear',\n",
    "         'is_weekend', 'weekofyear', 'month', 'season']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# encode Date as an integer in YYYYMMDD form\n",
    "new_feats['Date'] = new_feats['Date'].dt.strftime(\"%Y%m%d\").astype(int)\n",
    "\n",
    "# time-based split: rows dated 2021-11-11 or later are held out for validation,\n",
    "# earlier rows are used for training\n",
    "valid = new_feats[new_feats['Date'] >= 20211111].copy()\n",
    "train = new_feats[new_feats['Date'] < 20211111].copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((421376, 41), (31105, 41))"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.shape, valid.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "papermill": {
     "duration": 10.373257,
     "end_time": "2022-04-17T07:17:23.930551",
     "exception": false,
     "start_time": "2022-04-17T07:17:13.557294",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jovyan/.local/lib/python3.6/site-packages/lightgbm/sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.\n",
      "  _log_warning(\"'verbose' argument is deprecated and will be removed in a future release of LightGBM. \"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.015047\n",
      "[LightGBM] [Debug] init for col-wise cost 0.000138 seconds, init for row-wise cost 0.057121 seconds\n",
      "[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.102671 seconds.\n",
      "You can set `force_col_wise=true` to remove the overhead.\n",
      "[LightGBM] [Info] Total Bins 3978\n",
      "[LightGBM] [Info] Number of data points in the train set: 421376, number of used features: 20\n",
      "[LightGBM] [Info] Start training from score 0.000716\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 17\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 19\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 20\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 17\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
      "[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "LGBMRegressor(learning_rate=0.379687157316759, random_state=2022, verbose=2)"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
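   ],
   "source": [
    "from lightgbm import log_evaluation\n",
    "\n",
    "# model hyperparameters; N_EST and LR are set earlier in the notebook\n",
    "params = {\n",
    "    'n_estimators': int(N_EST),\n",
    "    'learning_rate': float(LR),\n",
    "    'random_state': 2022,\n",
    "    'verbose': 2,  # debug-level LightGBM logging\n",
    "}\n",
    "\n",
    "# model initialization\n",
    "model = LGBMRegressor(**params)\n",
    "\n",
    "# training data\n",
    "X = train[feats]\n",
    "y = train[\"Target\"]\n",
    "\n",
    "# validation data\n",
    "X_test = valid[feats]\n",
    "y_test = valid[\"Target\"]\n",
    "\n",
    "# fitting; the `verbose` argument of fit() is deprecated in LightGBM, so\n",
    "# per-iteration evaluation logging is silenced with the log_evaluation callback\n",
    "model.fit(X, y, eval_set=[(X_test, y_test)], callbacks=[log_evaluation(period=0)])"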
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.01428,
     "end_time": "2022-04-17T07:17:23.959655",
     "exception": false,
     "start_time": "2022-04-17T07:17:23.945375",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Evaluation and Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "tags": [
     "block:prediction",
     "prev:modelling"
    ]
   },
   "outputs": [],
   "source": [
    "# predict on the validation set\n",
    "preds = model.predict(X_test)\n",
    "\n",
    "# validation RMSE, rounded to 5 decimal places\n",
    "rmse = np.round(mean_squared_error(y_test, preds)**0.5, 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "tags": [
     "pipeline-metrics"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.02665\n"
     ]
    }
   ],
   "source": [
    "print(rmse)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Make submission"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "tags": [
     "skip"
    ]
   },
   "outputs": [],
   "source": [
    "sys.path.insert(0, 'helper-files')\n",
    "from local_api import local_api"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "tags": [
     "skip"
    ]
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py:3263: DtypeWarning: Columns (7,8,9,10) have mixed types.Specify dtype option on import or set low_memory=False.\n",
      "  if (await self.run_code(code, result,  async_=asy)):\n",
      "100%|██████████| 3/3 [00:00<00:00, 178.55it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.28it/s]\n",
      "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:30: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated.  Please use Series.dt.isocalendar().week instead.\n",
      "100%|██████████| 3/3 [00:00<00:00, 214.74it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.46it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 278.83it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.38it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 265.66it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.15it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 190.16it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.86it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 218.28it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.64it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 293.18it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  4.37it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 272.18it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.49it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 192.11it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.31it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 274.34it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.99it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 230.67it/s]\n",
      "100%|██████████| 3/3 [00:01<00:00,  2.90it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 260.86it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.86it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 258.50it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.34it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 211.81it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.50it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 231.95it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.23it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 194.62it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.59it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 193.03it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.37it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 224.41it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.60it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 232.46it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.58it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 199.00it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.95it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 277.85it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.78it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 263.43it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.24it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 202.12it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.49it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 251.46it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.45it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 209.96it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.37it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 284.58it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.60it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 224.20it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.25it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 156.18it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.46it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 211.85it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.09it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 207.08it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.75it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 269.89it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.67it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 278.68it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.28it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 248.94it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.33it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 251.54it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.10it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 240.36it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.59it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 254.44it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.19it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 142.48it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.34it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 309.95it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.56it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 314.17it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.10it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 240.49it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.91it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 191.52it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.51it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 176.42it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.28it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 219.95it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.90it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 213.78it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.31it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 194.32it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.59it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 293.83it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.76it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 280.89it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  4.28it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 302.84it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  4.38it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 141.82it/s]\n",
      "100%|██████████| 3/3 [00:01<00:00,  2.92it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 197.56it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.24it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 275.77it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  4.26it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 273.01it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.90it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 305.28it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.45it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 285.51it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.48it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 296.99it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.36it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00, 250.82it/s]\n",
      "100%|██████████| 3/3 [00:00<00:00,  3.79it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.10455440901403816\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>SecuritiesCode</th>\n",
       "      <th>Rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2022-02-28</td>\n",
       "      <td>1301</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2022-02-28</td>\n",
       "      <td>1332</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2022-02-28</td>\n",
       "      <td>1333</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2022-02-28</td>\n",
       "      <td>1375</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2022-02-28</td>\n",
       "      <td>1376</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Date  SecuritiesCode  Rank\n",
       "0  2022-02-28            1301     0\n",
       "1  2022-02-28            1332     1\n",
       "2  2022-02-28            1333     2\n",
       "3  2022-02-28            1375     3\n",
       "4  2022-02-28            1376     4"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myapi = local_api('data/supplemental_files')\n",
    "env = myapi.make_env()\n",
    "\n",
    "iter_test = env.iter_test()\n",
    "for (prices, options, financials, trades, secondary_prices, sample_prediction) in iter_test:\n",
    "    # build the model features and encode Date as a YYYYMMDD integer, as for the training data\n",
    "    prices = feat_eng(prices, ['High', 'Low', 'Open', 'Close', 'Volume'])\n",
    "    prices['Date'] = prices['Date'].dt.strftime(\"%Y%m%d\").astype(int)\n",
    "    prices[\"Target\"] = model.predict(prices[feats])\n",
    "    # use the predicted target directly as the ranking score\n",
    "    sample_prediction[\"Prediction\"] = prices[\"Target\"]\n",
    "    # rank 0 goes to the highest predicted return\n",
    "    sample_prediction.sort_values(by=\"Prediction\", ascending=False, inplace=True)\n",
    "    sample_prediction['Rank'] = np.arange(len(sample_prediction))\n",
    "    sample_prediction.sort_values(by=\"SecuritiesCode\", ascending=True, inplace=True)\n",
    "    submission = sample_prediction[[\"Date\", \"SecuritiesCode\", \"Rank\"]]\n",
    "    env.predict(submission)\n",
    "print(env.score())\n",
    "submission.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kubeflow_notebook": {
   "autosnapshot": true,
   "experiment": {
    "id": "new",
    "name": "jpx-tokyo-stock-exchange"
   },
   "experiment_name": "jpx-tokyo-stock-exchange",
   "katib_metadata": {
    "algorithm": {
     "algorithmName": "grid"
    },
    "maxFailedTrialCount": 3,
    "maxTrialCount": 12,
    "objective": {
     "objectiveMetricName": "",
     "type": "minimize"
    },
    "parallelTrialCount": 3,
    "parameters": []
   },
   "katib_run": false,
   "pipeline_description": "JPX Tokyo Stock Exchange Prediction",
   "pipeline_name": "jpx-tokyo-stock-exchange-pipeline",
   "snapshot_volumes": true,
   "steps_defaults": [
    "label:access-ml-pipeline:true",
    "label:kaggle-secret:true",
    "label:access-rok:true"
   ],
   "volume_access_mode": "rwm",
   "volumes": [
    {
     "annotations": [],
     "mount_point": "/home/jovyan",
     "name": "dem-workspace-snqdc",
     "size": 5,
     "size_type": "Gi",
     "snapshot": false,
     "type": "clone"
    }
   ]
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 32.012084,
   "end_time": "2022-04-17T07:17:25.053666",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2022-04-17T07:16:53.041582",
   "version": "2.3.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
